Classifying Syntactic Regularities for Hundreds of Languages
Abstract
This paper compares classification methods for linguistic typology, with the aim of expanding an extensive but sparse language resource: the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013). We experimented with a variety of regression and nearest-neighbor methods for classification over a set of 325 languages and six syntactic rules drawn from WALS. To classify each rule, we consider the typological features of the other five rules; linguistic features extracted from a word-aligned Bible in each language; and genealogical features (genus and family) of each language. In general, we find that propagating the majority label among all languages of the same genus achieves the best label-prediction accuracy. A logistic regression model that combines typological and linguistic features offers the next best performance. Interestingly, this model outperforms propagating the majority label among all languages of the same family.
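The genus-majority baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout, the function names (`genus_majority_predict`, `leave_one_out_accuracy`), and the toy languages and labels below are all hypothetical, chosen only to show the idea of predicting a missing label from the other languages in the same genus.

```python
from collections import Counter

def genus_majority_predict(target, languages, feature):
    """Predict `feature` for `target` as the most common label among the
    OTHER languages of the same genus; return None if none are labeled."""
    genus = languages[target]["genus"]
    labels = [
        info["features"][feature]
        for name, info in languages.items()
        if name != target
        and info["genus"] == genus
        and feature in info["features"]
    ]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]

def leave_one_out_accuracy(languages, feature):
    """Hold out each labeled language in turn and score the genus-majority
    prediction. Languages with no gold label or no prediction are skipped."""
    correct = total = 0
    for name, info in languages.items():
        gold = info["features"].get(feature)
        if gold is None:
            continue
        pred = genus_majority_predict(name, languages, feature)
        if pred is None:
            continue
        total += 1
        correct += int(pred == gold)
    return correct / total if total else 0.0

# Toy, invented data (assumption): not real WALS entries.
TOY = {
    "lang_a": {"genus": "G1", "features": {"word_order": "SVO"}},
    "lang_b": {"genus": "G1", "features": {"word_order": "SVO"}},
    "lang_c": {"genus": "G1", "features": {"word_order": "SVO"}},
    "lang_d": {"genus": "G1", "features": {"word_order": "SOV"}},
    "lang_e": {"genus": "G2", "features": {"word_order": "SOV"}},
    "lang_f": {"genus": "G2", "features": {"word_order": "SOV"}},
    "lang_g": {"genus": "G2", "features": {}},  # label missing
    "lang_h": {"genus": "G3", "features": {"word_order": "VSO"}},  # lone genus member
}

if __name__ == "__main__":
    print(genus_majority_predict("lang_g", TOY, "word_order"))
    print(leave_one_out_accuracy(TOY, "word_order"))
```

A language that is the only labeled member of its genus receives no prediction in this sketch; a fuller system would need a fallback for that case, for example the family-level majority that the abstract also mentions.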
Similar resources
The Relationship between Syntactic and Lexical Complexity in Speech Monologues of EFL Learners
This study aims to explore the relationship between syntactic and lexical complexity and also the relationship between different aspects of lexical complexity. To this end, speech monologues of 35 Iranian high-intermediate learners of English on three different tasks (i.e. argumentation, description, and narration) were analyzed for correlations between one measure of sy...
Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table
Sentences are far more ambiguous than one might have thought. There may be hundreds, perhaps thousands, of syntactic parse trees for certain very natural sentences of English. This fact has been a major problem confronting natural language processing, especially when a large percentage of the syntactic parse trees are enumerated during semantic/pragmatic processing. In this paper we propose som...
Analysing Syntactic Regularities in Ontologies
Syntactic regularities are repetitive structures of axioms in the asserted form of an ontology. The Regularity Inspector for Ontologies (RIO) is a framework for detecting such regularities in ontologies using cluster analysis. Detection of syntactic regularities can be used to identify parts of an ontology that have a similar syntactic structure, and could therefore provide an intuition of thei...
Syntactic Structures and Rhetorical Functions of Electrical Engineering, Psychiatry, and Linguistics Research Article Titles in English and Persian: A Cross-linguistic and Cross-disciplinary Study
A research article (RA) title is the first and foremost feature that attracts the reader's attention, the feature from which she/he may decide whether the whole article is worth reading. The present study attempted to investigate syntactic structures and rhetorical functions of RA titles written in English and Persian and published in journals in three disciplines of Electrical Engineering, Psy...
Native-like Event-related Potentials in Processing the Second Language Syntax: Late Bilinguals
Background: The P600 brain wave reflects syntactic processes in response to different first language (L1) syntactic violations, syntactic repair, structural reanalysis, and specific semantic components. Unlike semantic processing, aspects of the second language (L2) syntactic processing differ from the L1, particularly at lower levels of proficiency. At higher L2 proficiency, syntactic violatio...
Journal title: CoRR
Volume: abs/1603.08016
Publication date: 2016